Supplemental material (N-CPL training hyperparameters):
- Epochs: 8
- Loss function for policy: categorical cross-entropy
- Loss function for value function: Huber
- Discount factor used in TD learning: 0.99
- Time steps between target network updates (value network): 10,000
- Interval size of learning schedule:

Freeway is excluded from this table as Junyent et al. [

Due to computational constraints we could not tune the hyperparameters of N-CPL.
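The hyperparameters above can be collected into a minimal training-configuration sketch. This is illustrative only: the variable names, helper functions, and structure below are assumptions, not N-CPL's actual training code, which is not shown in this document.

```python
import numpy as np

# Hyperparameters reported in the supplemental material; the dict layout
# and key names are hypothetical, chosen here for illustration.
CONFIG = {
    "epochs": 8,
    "policy_loss": "categorical_crossentropy",
    "value_loss": "huber",
    "gamma": 0.99,                  # discount factor used in TD learning
    "target_update_steps": 10_000,  # steps between target-network syncs
}

def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(pred - target)
    quadratic = np.minimum(err, delta)
    linear = err - quadratic
    return float(np.mean(0.5 * quadratic ** 2 + delta * linear))

def categorical_crossentropy(probs, onehot, eps=1e-12):
    """Cross-entropy between a predicted action distribution and a one-hot target."""
    return float(-np.sum(onehot * np.log(probs + eps)))

def td_target(reward, next_value, gamma=CONFIG["gamma"]):
    """One-step TD target used to fit the value network."""
    return reward + gamma * next_value
```

For example, `td_target(1.0, 0.0)` returns the immediate reward, while a non-terminal successor state contributes its bootstrapped value discounted by 0.99.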
Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark
Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce
We propose new width-based planning and learning algorithms inspired by a careful analysis of the design decisions made by previous width-based planners. The algorithms are applied to the Atari-2600 games, and our best performing algorithm, Novelty guided Critical Path Learning (N-CPL), outperforms the previously introduced width-based planning and learning algorithms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1). Furthermore, we present a taxonomy of the Atari-2600 games according to some of their defining characteristics. This analysis of the games provides further insight into the behaviour and performance of the algorithms introduced: for games with large branching factors, and for games with sparse meaningful rewards, N-CPL outperforms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1).
- Leisure & Entertainment > Sports (0.68)
- Leisure & Entertainment > Games > Computer Games (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)